<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">

<HTML>
  <HEAD>
    <META name="generator" content=
    "HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net">
    <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

    <TITLE>BSim Database</TITLE>
    <LINK rel="stylesheet" type="text/css" href="help/shared/DefaultStyle.css">
    <LINK rel="stylesheet" type="text/css" href="../../shared/languages.css">
    <META name="generator" content="DocBook XSL Stylesheets V1.79.1">
    <LINK rel="home" href="index.html" title="BSim Database">
    <LINK rel="up" href="index.html" title="BSim Database">
    <LINK rel="prev" href="index.html" title="BSim Database">
    <LINK rel="next" href="DatabaseConfiguration.html" title="Database Configuration">
  </HEAD>

  <BODY>
    <DIV class="chapter">
      <DIV class="titlepage">
        <DIV>
          <DIV>
            <H1 class="title"><A name="DatabaseOverview"></A>BSim Database</H1>
          </DIV>
        </DIV>
      </DIV>

      <DIV class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
        <H3 class="title">Quick Reference Links</H3>

        <DIV class="itemizedlist">
          <UL class="itemizedlist compact" style="list-style-type: disc;">
            <LI class="listitem"><A class="link" href="DatabaseConfiguration.html" title=
            "Database Configuration">Database Configuration</A></LI>

            <LI class="listitem"><A class="link" href="IngestProcess.html" title=
            "Ingesting Executables">Ingesting Executables</A></LI>

            <LI class="listitem"><A class="link" href="../BSimSearchPlugin/BSimSearch.html" title=
            "Querying a BSim Database">Querying a BSim Database</A></LI>

            <LI class="listitem"><A class="link" href="FeatureWeight.html" title=
            "Features and Weights">Features and Weights</A></LI>

            <LI class="listitem"><A class="link" href="CommandLineReference.html" title=
            "Command-Line Utility Reference">Command-Line Reference</A></LI>
          </UL>
        </DIV>
      </DIV>

      <DIV class="section">
        <DIV class="titlepage">
          <DIV>
            <DIV>
              <H2 class="title" style="clear: both"><A name="IntroOverview"></A>Overview</H2>
            </DIV>
          </DIV>
        </DIV>

        <P>Welcome to Ghidra's BSim (Behavioral Similarity) Database. This database technology is
        designed to allow reverse engineers to ingest metadata about previously analyzed binary
        executables into a central server or local database. The database can then be queried
        while analyzing new, unknown executables to quickly discover previously seen functions and
        libraries.</P>

        <P>The primary record ingested into the database describes a single function. The most
        novel aspects of the database are that:</P>

        <DIV class="informalexample">
          <DIV class="itemizedlist">
            <UL class="itemizedlist" style="list-style-type: disc;">
              <LI class="listitem">Queries are tolerant of variations in the compilation of the
              function.</LI>

              <LI class="listitem">All records are indexed for quick queries, even for very large
              collections.</LI>
            </UL>
          </DIV>
        </DIV>

        <P>The primary feature set used for indexing a function is extracted from a concise
        description of the data-flow of the function, not the explicit encoding of the machine
        instructions. The data-flow description is a graph-based (abstract syntax tree)
        representation, based on Ghidra's intermediate representation language, p-code, and is
        generated by the Ghidra decompiler. The resulting function descriptions are normalized to
        minimize the impact of variations due to:</P>

        <DIV class="informalexample">
          <DIV class="itemizedlist">
            <UL class="itemizedlist" style="list-style-type: disc;">
              <LI class="listitem">Equivalent machine instructions</LI>

              <LI class="listitem">Storage location (registers, stack, memory)</LI>

              <LI class="listitem">Instruction order</LI>

              <LI class="listitem">Many forms of compiler transformation</LI>

              <LI class="listitem">Even some forms of deliberate obfuscation</LI>
            </UL>
          </DIV>
        </DIV>

        <P>Records are indexed using current text-retrieval strategies, which allow "nearest
        neighbor" queries. The feature set of an unknown function being queried does not have to
        exactly match the features of a "hit" in the database; only a configurable percentage of
        them must match. This supplies an additional level of tolerance for "functional
        difference" on top of the tolerance for "functionally equivalent" variations provided by
        the decompiler. In other words, there can be some amount of true change in the underlying
        source code, and the query may still be able to find a match.</P>
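        <P>The threshold-based matching described above can be illustrated with a small sketch.
        It assumes, for illustration only, that each function is represented as a vector of
        feature weights and that two functions are compared with cosine similarity, a common
        measure in text retrieval. The class and method names (<CODE>FeatureSimilarity</CODE>,
        <CODE>cosine</CODE>, <CODE>isHit</CODE>) are hypothetical and are not part of the BSim
        API, and the scoring shown is not necessarily what BSim computes internally.</P>

```java
import java.util.stream.IntStream;

// Hypothetical illustration of threshold-based nearest-neighbor matching.
// ASSUMPTION: functions are modeled as dense vectors of feature weights, where
// index i holds the weight of (hypothetical) feature i, or 0.0 if it is absent.
// This is NOT BSim's actual implementation or API.
public class FeatureSimilarity {

    // Dot product of two equal-length weight vectors.
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        return IntStream.range(0, a.length).mapToDouble(i -> a[i] * b[i]).sum();
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means the vectors point the same way.
    // Returns 0.0 if either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        double denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
        return denom == 0.0 ? 0.0 : dot(a, b) / denom;
    }

    // A candidate counts as a "hit" when its similarity meets the configurable
    // threshold, so an exact feature-for-feature match is not required.
    static boolean isHit(double[] query, double[] candidate, double threshold) {
        return cosine(query, candidate) >= threshold;
    }

    public static void main(String[] args) {
        // Two hypothetical functions, each with four features, three of them shared.
        double[] query     = {1.0, 1.0, 1.0, 1.0, 0.0};
        double[] candidate = {1.0, 1.0, 1.0, 0.0, 1.0};
        double sim = cosine(query, candidate);
        System.out.printf("similarity = %.2f%n", sim);
        System.out.println("hit at 0.70? " + isHit(query, candidate, 0.70));
        System.out.println("hit at 0.90? " + isHit(query, candidate, 0.90));
    }
}
```

        <P>In this sketch the two functions share three of their four features, giving a
        similarity of 0.75: the candidate counts as a hit under a 0.70 threshold but not under
        0.90. Lowering the threshold tolerates more source-level change at the cost of admitting
        more loosely related matches.</P>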

        <P>Queries are quick: for a single function, results typically come back in microseconds,
        even for a database containing millions of functions.</P>
      </DIV>

      <DIV class="section">
        <DIV class="titlepage">
          <DIV>
            <DIV>
              <H2 class="title" style="clear: both"><A name="ToolOverview"></A>Overview of
              Tools</H2>
            </DIV>
          </DIV>
        </DIV>

        <P>A BSim Database is built on top of one of three technologies: PostgreSQL, a local H2
        database, or Elasticsearch. PostgreSQL is a robust, production-capable server that supports
        multiple simultaneous connections and is extremely fault tolerant. Elasticsearch is a
        scalable search engine that allows a database to be distributed across an entire cluster
        of machines. The local H2 database is provided for convenience and for use with small
        personal collections. For any of these options, this distribution includes specific
        reverse-engineering extensions and clients that provide the following capabilities:</P>

        <DIV class="informalexample">
          <DIV class="itemizedlist">
            <UL class="itemizedlist" style="list-style-type: disc;">
              <LI class="listitem">
                Integration with a Ghidra Server or local project: 

                <DIV class="itemizedlist">
                  <UL class="itemizedlist" style="list-style-type: circle;">
                    <LI class="listitem">Ingest can be performed with respect to a Ghidra
                    repository on either a Ghidra Server or in a local project.</LI>

                    <LI class="listitem">Query results can refer to executables within a
                    repository.</LI>

                    <LI class="listitem">Easy command-line ingest using the <CODE class=
                    "filename">bsim</CODE> command script.</LI>
                  </UL>
                </DIV>
              </LI>

              <LI class="listitem">
                Client as a Ghidra Plug-in:

                <DIV class="itemizedlist">
                  <UL class="itemizedlist" style="list-style-type: circle;">
                    <LI class="listitem">Ghidra includes a plug-in client that integrates a query
                    dialog and results windows directly into the main code browser.</LI>
                  </UL>
                </DIV>
              </LI>

              <LI class="listitem">
                Query API:

                <DIV class="itemizedlist">
                  <UL class="itemizedlist" style="list-style-type: circle;">
                    <LI class="listitem">Ghidra includes a Java API to the BSim server so that
                    queries (and potentially ingest) can be incorporated into analyst scripts. The
                    API marshals queries and results between an active Ghidra session and a BSim
                    server.</LI>
                  </UL>
                </DIV>
              </LI>
            </UL>
          </DIV>
        </DIV>

        <DIV class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
          <H3 class="title">Note</H3>

          <P>The PostgreSQL server software is currently supported only on the <SPAN class=
          "emphasis"><EM>Linux</EM></SPAN> and <SPAN class="emphasis"><EM>macOS</EM></SPAN>
          platforms. Elasticsearch server software must be obtained separately. Small local
          file-based databases are supported on all platforms via an embedded H2 database engine.
          The BSim client software is supported on all platforms and can connect to servers
          running on a different platform.</P>
        </DIV>
      </DIV>
    </DIV>
  </BODY>
</HTML>
